{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Chapter 16: Principal Component Analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
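    "1. Let $x$ be an $m$-dimensional random variable with mean $\\mu$ and covariance matrix $\\Sigma$.\n",
    "\n",
    "Consider the linear transformation from the $m$-dimensional random variable $x$ to the $m$-dimensional random variable $y$\n",
    "$$y _ { i } = \\alpha _ { i } ^ { T } x = \\sum _ { k = 1 } ^ { m } \\alpha _ { k i } x _ { k } , \\quad i = 1,2 , \\cdots , m$$\n",
    "\n",
    "where $\\alpha _ { i } ^ { T } = ( \\alpha _ { 1 i } , \\alpha _ { 2 i } , \\cdots , \\alpha _ { m i } )$.\n",
    "\n",
    "If this linear transformation satisfies the following conditions, the $y_i$ are called the population principal components:\n",
    "\n",
    "(1) $\\alpha _ { i } ^ { T } \\alpha _ { i } = 1 , i = 1,2 , \\cdots , m$;\n",
    "\n",
    "(2) $\\operatorname { cov } ( y _ { i } , y _ { j } ) = 0 ( i \\neq j )$;\n",
    "\n",
    "(3) The variable $y_1$ has the largest variance among all linear transformations of $x$; $y_2$ has the largest variance among all linear transformations of $x$ that are uncorrelated with $y_1$; in general, $y_i$ has the largest variance among all linear transformations of $x$ that are uncorrelated with $y _ { 1 } , y _ { 2 } , \\cdots , y _ { i - 1 }$ $( i = 1,2 , \\cdots , m )$. In this case $y _ { 1 } , y _ { 2 } , \\cdots , y _ { m }$ are called the first, second, ..., $m$-th principal components of $x$, respectively.\n",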
    "\n",
    "2. Let $x$ be an $m$-dimensional random variable with covariance matrix $\\Sigma$, let the eigenvalues of $\\Sigma$ be $\\lambda _ { 1 } \\geq \\lambda _ { 2 } \\geq \\cdots \\geq \\lambda _ { m } \\geq 0$, and let the corresponding unit eigenvectors be $\\alpha _ { 1 } , \\alpha _ { 2 } , \\cdots , \\alpha _ { m }$. Then the $i$-th principal component of $x$ can be written as\n",
    "\n",
    "$$y _ { i } = \\alpha _ { i } ^ { T } x = \\sum _ { k = 1 } ^ { m } \\alpha _ { k i } x _ { k } , \\quad i = 1,2 , \\cdots , m$$\n",
    "Moreover, the variance of the $i$-th principal component of $x$ is the $i$-th eigenvalue of the covariance matrix $\\Sigma$, that is, $$\\operatorname { var } ( y _ { i } ) = \\alpha _ { i } ^ { T } \\Sigma \\alpha _ { i } = \\lambda _ { i }$$\n",
    "\n",
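    "As an illustration (a toy example that is not part of the original notes), take $m = 2$ and assume\n",
    "$$\\Sigma = \\left[ \\begin{array} { c c } { 2 } & { 1 } \\\\ { 1 } & { 2 } \\end{array} \\right]$$\n",
    "Its eigenvalues are $\\lambda _ { 1 } = 3$ and $\\lambda _ { 2 } = 1$, with unit eigenvectors $\\alpha _ { 1 } = \\frac { 1 } { \\sqrt { 2 } } ( 1 , 1 ) ^ { T }$ and $\\alpha _ { 2 } = \\frac { 1 } { \\sqrt { 2 } } ( 1 , - 1 ) ^ { T }$, so under this assumption\n",
    "$$y _ { 1 } = \\frac { 1 } { \\sqrt { 2 } } ( x _ { 1 } + x _ { 2 } ) , \\quad y _ { 2 } = \\frac { 1 } { \\sqrt { 2 } } ( x _ { 1 } - x _ { 2 } ) , \\quad \\operatorname { var } ( y _ { 1 } ) = \\alpha _ { 1 } ^ { T } \\Sigma \\alpha _ { 1 } = 3 , \\quad \\operatorname { var } ( y _ { 2 } ) = \\alpha _ { 2 } ^ { T } \\Sigma \\alpha _ { 2 } = 1$$\n",
    "\n",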
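    "3. The principal components have the following properties:\n",
    "\n",
    "The covariance matrix of the principal components $y$ is the diagonal matrix $$\\operatorname { cov } ( y ) = \\Lambda = \\operatorname { diag } ( \\lambda _ { 1 } , \\lambda _ { 2 } , \\cdots , \\lambda _ { m } )$$\n",
    "\n",
    "The sum of the variances of the principal components $y$ equals the sum of the variances of the random variable $x$\n",
    "$$\\sum _ { i = 1 } ^ { m } \\lambda _ { i } = \\sum _ { i = 1 } ^ { m } \\sigma _ { i i }$$\n",
    "where $\\sigma _ { i i }$ is the variance of $x_i$, i.e. the $i$-th diagonal element of the covariance matrix $\\Sigma$.\n",
    "\n",
    "The correlation coefficient $\\rho ( y _ { k } , x _ { i } )$ between the principal component $y_k$ and the variable $x_i$ is called the factor loading. It expresses the correlation between the $k$-th principal component $y_k$ and the variable $x_i$, i.e. the contribution of $y_k$ to $x_i$.\n",
    "$$\\rho ( y _ { k } , x _ { i } ) = \\frac { \\sqrt { \\lambda _ { k } } \\alpha _ { i k } } { \\sqrt { \\sigma _ { i i } } } , \\quad k , i = 1,2 , \\cdots , m$$\n",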
    "\n",
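    "As a small check of these properties, consider a toy example (not from the original notes) in which $\\Sigma$ is assumed to have rows $( 2 , 1 )$ and $( 1 , 2 )$, so that $\\lambda _ { 1 } = 3$, $\\lambda _ { 2 } = 1$, $\\alpha _ { 1 } = \\frac { 1 } { \\sqrt { 2 } } ( 1 , 1 ) ^ { T }$ and $\\alpha _ { 2 } = \\frac { 1 } { \\sqrt { 2 } } ( 1 , - 1 ) ^ { T }$. The variances add up as expected,\n",
    "$$\\lambda _ { 1 } + \\lambda _ { 2 } = 3 + 1 = 4 = 2 + 2 = \\sigma _ { 11 } + \\sigma _ { 22 }$$\n",
    "and the factor loadings of $x_1$ are\n",
    "$$\\rho ( y _ { 1 } , x _ { 1 } ) = \\frac { \\sqrt { 3 } \\cdot \\frac { 1 } { \\sqrt { 2 } } } { \\sqrt { 2 } } = \\frac { \\sqrt { 3 } } { 2 } , \\quad \\rho ( y _ { 2 } , x _ { 1 } ) = \\frac { \\sqrt { 1 } \\cdot \\frac { 1 } { \\sqrt { 2 } } } { \\sqrt { 2 } } = \\frac { 1 } { 2 }$$\n",
    "In this example the squared loadings of $x_1$ sum to $\\frac { 3 } { 4 } + \\frac { 1 } { 4 } = 1$.\n",
    "\n",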
    "4. Sample principal component analysis is principal component analysis based on the sample covariance matrix.\n",
    "\n",
    "Given the sample matrix\n",
    "$$X = \\left[ \\begin{array} { l l l l } { x _ { 1 } } & { x _ { 2 } } & { \\cdots } & { x _ { n } } \\end{array} \\right] = \\left[ \\begin{array} { c c c c } { x _ { 11 } } & { x _ { 12 } } & { \\cdots } & { x _ { 1 n } } \\\\ { x _ { 21 } } & { x _ { 22 } } & { \\cdots } & { x _ { 2 n } } \\\\ { \\vdots } & { \\vdots } & { } & { \\vdots } \\\\ { x _ { m 1 } } & { x _ { m 2 } } & { \\cdots } & { x _ { m n } } \\end{array} \\right]$$\n",
    "\n",
    "where $x _ { j } = ( x _ { 1 j } , x _ { 2 j } , \\cdots , x _ { m j } ) ^ { T }$ is the $j$-th independent observation of $x$, $j = 1 , 2 , \\cdots , n$.\n",
    "\n",
    "The sample covariance matrix of $X$ is\n",
    "$$\\left. \\begin{array} { c } { S = [ s _ { i j } ] _ { m \\times m } , \\quad s _ { i j } = \\frac { 1 } { n - 1 } \\sum _ { k = 1 } ^ { n } ( x _ { i k } - \\overline { x } _ { i } ) ( x _ { j k } - \\overline { x } _ { j } ) } \\\\ { i = 1,2 , \\cdots , m , \\quad j = 1,2 , \\cdots , m } \\end{array} \\right.$$\n",
    "\n",
    "Given the sample data matrix $X$, consider the linear transformation from the vector $x$ to $y$ $$y = A ^ { T } x$$\n",
    "where\n",
    "$$A = \\left[ \\begin{array} { l l l l } { a _ { 1 } } & { a _ { 2 } } & { \\cdots } & { a _ { m } } \\end{array} \\right] = \\left[ \\begin{array} { c c c c } { a _ { 11 } } & { a _ { 12 } } & { \\cdots } & { a _ { 1 m } } \\\\ { a _ { 21 } } & { a _ { 22 } } & { \\cdots } & { a _ { 2 m } } \\\\ { \\vdots } & { \\vdots } & { } & { \\vdots } \\\\ { a _ { m 1 } } & { a _ { m 2 } } & { \\cdots } & { a _ { m m } } \\end{array} \\right]$$\n",
    "\n",
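    "If this linear transformation satisfies the following conditions, the $y_i$ are called the sample principal components. The first sample principal component $y _ { 1 } = a _ { 1 } ^ { T } x$ is the linear transformation of $x$ that, subject to $a _ { 1 } ^ { T } a _ { 1 } = 1$, maximizes the sample variance $a _ { 1 } ^ { T } S a _ { 1 }$ of $a _ { 1 } ^ { T } x _ { j } ( j = 1,2 , \\cdots , n )$;\n",
    "\n",
    "The second sample principal component $y _ { 2 } = a _ { 2 } ^ { T } x$ is the linear transformation of $x$ that, subject to $a _ { 2 } ^ { T } a _ { 2 } = 1$ and to the sample covariance of $a _ { 2 } ^ { T } x _ { j }$ and $a _ { 1 } ^ { T } x _ { j } ( j = 1,2 , \\cdots , n )$ being zero, $a _ { 1 } ^ { T } S a _ { 2 } = 0$, maximizes the sample variance $a _ { 2 } ^ { T } S a _ { 2 }$ of $a _ { 2 } ^ { T } x _ { j } ( j = 1,2 , \\cdots , n )$;\n",
    "\n",
    "In general, the $i$-th sample principal component $y _ { i } = a _ { i } ^ { T } x$ is the linear transformation of $x$ that, subject to $a _ { i } ^ { T } a _ { i } = 1$ and to the sample covariance of $a _ { i } ^ { T } x _ { j }$ and $a _ { k } ^ { T } x _ { j } ( k < i , j = 1,2 , \\cdots , n )$ being zero, $a _ { k } ^ { T } S a _ { i } = 0$, maximizes the sample variance $a _ { i } ^ { T } S a _ { i }$ of $a _ { i } ^ { T } x _ { j } ( j = 1,2 , \\cdots , n )$.\n",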
    "\n",
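    "5. There are two main ways to carry out principal component analysis: eigendecomposition of the correlation matrix, or singular value decomposition of the sample matrix.\n",
    "\n",
    "(1) Eigendecomposition of the correlation matrix. Given the $m \\times n$ sample matrix $X$, standardized so that each row (variable) has zero mean and unit variance, compute the sample correlation matrix\n",
    "$$R = \\frac { 1 } { n - 1 } X X ^ { T }$$\n",
    "Then compute the $k$ largest eigenvalues of the sample correlation matrix and the corresponding unit eigenvectors, and form the matrix with orthonormal columns\n",
    "$$V = ( v _ { 1 } , v _ { 2 } , \\cdots , v _ { k } )$$\n",
    "\n",
    "Each column of $V$ corresponds to one principal component, giving the $k \\times n$ sample principal component matrix\n",
    "$$Y = V ^ { T } X$$\n",
    "\n",
    "(2) Singular value decomposition of the matrix $X$. Given the $m \\times n$ standardized sample matrix $X$, define\n",
    "$$X ^ { \\prime } = \\frac { 1 } { \\sqrt { n - 1 } } X ^ { T }$$\n",
    "Perform a truncated singular value decomposition of $X ^ { \\prime }$, keeping the $k$ largest singular values and the corresponding singular vectors, to obtain\n",
    "$$X ^ { \\prime } = U S V ^ { T }$$\n",
    "Each column of $V$ corresponds to one principal component, giving the $k \\times n$ sample principal component matrix $Y$\n",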
    "$$Y = V ^ { T } X$$"
   ]
  },
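  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two algorithms above are connected by a short calculation, sketched here under the assumptions that $X$ has already been standardized and that $X ^ { \\prime } = U S V ^ { T }$ is a full (untruncated) singular value decomposition, with $S$ denoting the matrix of singular values:\n",
    "$$X ^ { \\prime T } X ^ { \\prime } = \\frac { 1 } { n - 1 } X X ^ { T } = R , \\qquad X ^ { \\prime T } X ^ { \\prime } = V S ^ { T } U ^ { T } U S V ^ { T } = V ( S ^ { T } S ) V ^ { T }$$\n",
    "So the columns of $V$ are unit eigenvectors of $R$ and the squared singular values of $X ^ { \\prime }$ are the eigenvalues of $R$. This is why both routes should give the same principal components, up to the sign of each column when the eigenvalues are distinct."
   ]
  },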
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The code in this chapter is taken directly from the seventh programming exercise (ex7) of the Coursera Machine Learning course."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "----\n",
    "PCA (principal component analysis) is a dimensionality-reduction technique that transforms a large number of indicators into a small number of composite indicators."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sb\n",
    "from scipy.io import loadmat"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = loadmat('data/ex7data1.mat')\n",
    "# data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAArkAAAHSCAYAAADohdOwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAbf0lEQVR4nO3dUWil6Xkf8P8bjUyO3RjRehosbZzd3OjGQy0zuJQB09pplBBjhqUXDqTQ9mJvSnAoKOyUQkkvOgOCkl4FlnWDS1yH1BnrwttaMUyNa2hsZq111MTWRYONfabpjikicXqoh+nbi5VmdsYa6RzpnPOd857fD4ad/STrPGsdOP/v/Z7nfUutNQAA0JKf6LoAAAAYNyEXAIDmCLkAADRHyAUAoDlCLgAAzRFyAQBozqVJ/ND3vOc99fnnn5/EjwYAgCTJ66+//oNa6+WTvjaRkPv888/n7t27k/jRAACQJCmlfPdZX9OuAABAc4RcAACaI+QCANAcIRcAgOYIuQAANEfIBQCgOUIuAADNEXIBAGiOkAsAQHOEXAAAmiPkAgDQHCEXAIDmCLkAADRHyAUAoDlCLgAAzbnUdQEAAM+ys9fP9u5B7h0OsrrSy9bmeq5vrHVdFnNAyAUAZtLOXj83bu9n8OBhkqR/OMiN2/tJIuhyJu0KAMBM2t49eBRwjw0ePMz27kFHFTFPhFwAYCbdOxyMdB3eTsgFAGbS6kpvpOvwdkIuADCTtjbX01teeuJab3kpW5vrHVXEPDF4BgDMpOPhMrsrcB5CLgAws65vrAm1nIt2BQAAmiPkAgDQHCEXAIDmCLkAADRHyAUAoDlCLgAAzRFyAQBojpALAEBzhFwAAJoj5AIA0BwhFwCA5gi5AAA0R8gFAKA5Qi4AAM0RcgEAaI6QCwBAc4RcAACaI+QCANAcIRcAgOYIuQAANEfIBQCgOUIuAADNEXIBAGjOmSG3lLJeSnnjbX/+opTy69MoDgAAzuPSWd9Qaz1I8oEkKaUsJekn+fyE6wIAgHMbtV3ho0n+R631u5MoBgAAxmHUkPuJJJ+dRCEAADAuQ4fcUso7knw8yX98xtdfKqXcLaXcvX///rjqAwCAkY2ykvtLSb5Ra/1fJ32x1vpKrfVqrfXq5cuXx1MdAACcwygh91eiVQEAgDkwVMgtpbwzyd9Pcnuy5QAAwMWduYVYktRa/0+SvzHhWgAAYCyceAYAQHOEXAAAmiPkAgDQHCEXAIDmCLkAADRHyAUAoDlCLgAAzRFyAQBojpALAEBzhFwAAJoj5AIA0BwhFwCA5gi5AAA0R8gFAKA5Qi4AAM0RcgEAaI6QCwBAc4RcAACaI+QCANAcIRcAgOZc6roAAADGZ2evn+3dg9w7HGR1pZetzfVc31ib29c5LyEXAKARO3v93Li9n8GDh0mS/uEgN27vJ8lYA+i0XucitCsAADRie/fgUfA8NnjwMNu7B3P5Ohch5AIANOLe4WCk67P+Ohch5AIANGJ1pTfS9Vl/nYsQcgEAGrG1uZ7e8tIT13rLS9naXJ/L17kIg2cAAI04Hvqa9K4H03qdiyi11rH/0KtXr9a7d++O/ecCAMCxUsrrtdarJ31NuwIAAM0RcgEAaI6QCwBAc4RcAACaI+QCANAcIRcAgOYIuQAANEfIBQCgOUIuAADNcawvACPZ2evP9FGeAImQC8AIdvb6uXF7P4MHD5Mk/cNBbtzeTxJBF5gp2hUAGNr27sGjgHts8OBhtncPOqoI4GRCLgBDu3c4GOk6QFeEXACGtrrSG+k6QFeEXACGtrW5nt7y0hPXestL2dpc76gigJMZPANgaMfDZXZXAGadkAvASK5vrAm1wMzTrgAAQHOEXAAAmiPkAgDQHCEXAIDmCLkAADRHyAUAoDlCLgAAzRFyAQBojsMgAADmxM5e34mDQxJyAQDmwM5ePzdu72fw4GGSpH84yI3b
+0ki6J5AuwIAwBzY3j14FHCPDR48zPbuQUcVzTYhFwBgDtw7HIx0fdEJuQAAc2B1pTfS9UUn5ALAgtnZ6+farTt54eXXcu3Wnezs9bsuiSFsba6nt7z0xLXe8lK2Ntc7qmi2GTwDgAVieGl+Hf9+7K4wHCEXABbIacNLwtLsu76x5vc0JO0KALBADC+xKIRcAFgghpdYFEIuACwQw0ssCj25ALBADC+xKIRcAFgwhpdYBNoVAABojpALAEBzhFwAAJoj5AIA0ByDZwAAc2Jnr29njCEJuQAAc2Bnr58bt/cfHcvcPxzkxu39JBF0T6BdAQBgDmzvHjwKuMcGDx5me/ego4pmm5VcAGCmeUT/lnuHg5GuLzoruQDAzDp+RN8/HKTm8SP6nb1+16VN3epKb6Tri07IBQBmlkf0j21trqe3vPTEtd7yUrY21zuqaLZpVwAAZpZH9I8dt2ho3RiOkAsAzKzVlV76JwTaRX1Ef31jTagdknYFALiAnb1+rt26kxdefi3Xbt1ZyF7RSfKInvMaaiW3lLKS5NUk709Sk/yTWut/m2RhADDr7Fs6eR7Rc17Dtiv82yRfrLX+g1LKO5K8c4I1AcBcOG0oSggbH4/oOY8zQ24p5d1JPpzkHyVJrfVHSX402bIAYPYZioLZNUxP7s8luZ/kd0ope6WUV0sp75pwXQAw8+xbCrNrmJB7KckHk/x2rXUjyV8lefnpbyqlvFRKuVtKuXv//v0xlwkAs8dQFMyuYXpyv5/k+7XWrx39++dyQsittb6S5JUkuXr1ah1bhQAwowxFvcWxu8yiM0NurfXPSynfK6Ws11oPknw0yZ9OvjQAmH2LPhRlhwlm1bD75P5aks+UUv44yQeS/OvJlQQAzAvH7jKrhtpCrNb6RpKrE64FAJgzdphgVjnxDAA4NztMMKuEXADg3Owwwawa9sQzgIViWhyGY4cJZpWQC/AU0+LMknm44Vr0HSaYTdoVAJ5iWpxZcXzD1T8cpObxDdfOXr/r0mDmWckFeIppcaZhmBXa0264rJzC6azkAjzFtDiTNuwKrRsuOD8hF+AppsWZtGFbYtxwwfkJuQBPub6xlpsvXsnaSi8lydpKLzdfvOLxMGMz7AqtGy44Pz25ACcwLc4kra700j8h6D69Qmt7Ljg/IRcApmxrc/2JbeqSZ6/QuuGC8xFyAWDKrNDC5Am5ANABK7QwWQbPAABojpALAEBzhFwAAJoj5AIA0BwhFwCA5gi5AAA0R8gFAKA59skFAJiwnb2+wz+mTMgFAJignb3+E8c49w8HuXF7P0kE3QkScgGgAVYKZ9f27sGjgHts8OBhtncP/I4mSMgFgDlnpXC23TscjHSd8TB4BgBz7rSVQrq3utIb6TrjIeQCwAzY2evn2q07eeHl13Lt1p3s7PWH/t9aKZxtW5vr6S0vPXGtt7yUrc31jipaDNoVAKBjF203WF3ppX9CoLVSOBuOf4d6pqdLyAWAjl10MGlrc/2JkJxYKZw11zfWhNopE3IBoGMXbTewUgg/TsgFgI6No93ASiE8yeAZAHTMYBKMn5VcAOiYdgMYPyEXAGaAdgMYL+0KAAA0x0ouAABn2tnrz1VLjZALAMCpLnpgSRe0KwAAcKrTDiyZVUIuAACnuuiBJV0QcgEAONWzDiYZ5cCSaRNyAeAMO3v9XLt1Jy+8/Fqu3bqTnb1+1yXBVM3jgSUGzwDgFPM4cAPjNo8Hlgi5AHCK0wZuZvkDHsZt3g4s0a4AAKeYx4EbQMgFgFPN48ANIOQCMCGtDGvN48ANoCcXgAloaVhrmgM383ZsKswyIReAsWttWGsaAzct3RjALNCuAMDYGdYa3TwemwqzTMgFYOwMa43OjQGMl5ALwNgZ1hqdGwMYLyEXgLG7vrGWmy9eydpKLyXJ2kovN1+8orf0FG4MYLwMngEwEfN2OlLX5vHYVJhlQi4AzAg3BjA+2hUAAGiOkAsAQHOEXAAAmiPkAgDQHCEX
AIDmCLkAADRHyAUAoDlCLgAAzRFyAQBojpALAEBzhFwAAJoj5AIA0BwhFwCA5gi5AAA0R8gFAKA5Qi4AAM0RcgEAaM6lrgsAgFmxs9fP9u5B7h0OsrrSy9bmeq5vrHVdFkxMy+95IReAoT3rA7GFD8qdvX5u3N7P4MHDJEn/cJAbt/eTZO7+W2AYrb/ntSsAMJTjD8T+4SA1jz8Q/8XO/onXd/b6XZc8ku3dg0cf9scGDx5me/ego4pgslp/zwu5AAzlWR+In/3a95r4oLx3OBjpOsy71t/z2hW4kBYeUQLDedYH38NaR/r+WbW60kv/hJpXV3odVAOT1/p73kou5/asR5fz9ogSGM6zPviWShnp+2fV1uZ6estLT1zrLS9la3O9o4pgslp/zwu5nFvrvTzAk571gfgrf/tnmvigvL6xlpsvXsnaSi8lydpKLzdfvOLpFM1q/T2vXYFza72XB3jS8QffSS1KV3/2rzfRunR9Y20u64bzavk9L+Rybq338gA/7lkfiC1/UALzaah2hVLKd0op+6WUN0opdyddFPOh9V4eAGB+jbKS+/dqrT+YWCXMndMeXQIAdEm7AhfiESUAMIuG3V2hJvnDUsrrpZSXJlkQAABc1LAruddqrfdKKX8zyZdKKd+utX7l7d9wFH5fSpL3ve99Yy4TAACGN1TIrbXeO/rnm6WUzyf5UJKvPPU9ryR5JUmuXr168vE3ADAhTmAE3u7MkFtKeVeSn6i1/uXR338hyb+aeGUAFyT0LI7jExiPD6g5PoExid85LKhhenJ/OslXSynfTPL1JK/VWr842bIALsax04vFCYzA085cya21/lmSvzWFWgDG5rTQY2WvPU5gBJ427O4KAHNF6Fkszzpp0QmMsLiEXKBJQs9icQIj8DQhF2iS0LNYrm+s5eaLV7K20ktJsrbSy80Xr2hNgQXmxDOgSY6dXjxOYATeTsgFmiX0ACwu7QoAADRHyAUAoDlCLgAAzdGTCzziGFwAWiHkAkkeH4N7fErY8TG4SQTdKXOzAXBx2hWAJKcfg8v0HN9s9A8HqXl8s7Gz1++6NIC5IuQCSRyDOyvcbACMh5ALJHEM7qxwswEwHkIukMQxuLPCzQbAeAi5QJK3hstuvnglayu9lCRrK73cfPGKgacpc7MBMB52VwAecQxu947//7e7AsDFCLkAM8bNBsDFaVcAAKA5Qi4AAM3RrgDAQnGiHCwGIReAheH4aobhRqgN2hUAWBhOlOMsjtZuh5VcABaGE+UW0ygrs6fdCFnNnS9CLgALY3Wll/4JgdaJcvPvWUF21BYVN0Lt0K4AwMJwolybTmsxGLVFxdHa7RBygWfa2evn2q07eeHl13Lt1h09acw9x1e36bQgO+rKrBuhdmhXAE5kCp1WOVGuPacF2VFbVByt3Q4hFziR4QtgXpwWZLc215+4YU/OXpl1I9QG7QrAiQxfAPPitBYDLSqLy0oucCJT6MC8OKvFwMrsYhJygROd5xFfy5yABLNNkOVpQi5wIsMXjxnCA5g/Qi7wTFZG3mIID2D+GDwDOIMhPID5I+QCnMEJSADzR8gFOIMTkEbjpDxgFujJBTiDIbzhGdIDZoWQCzAEQ3jDMaQHzArtCgCMjSE9YFYIuQCMjSE9YFYIuQCMjSE9YFboyQXmniN3Z4chPWBWCLnAXDPNP3sM6QGzQLsCMNdOm+YHYHFZyYU55RH9W0zzA3ASK7kwh44f0fcPB6l5/Ih+EU+WMs0PwEmEXJhDHtE/ZpofgJNoV4A55BH9Y6b5ATiJkAtzaHWll/4JgXZRH9Gb5gfgadoVYA55RA8Ap7OSC3PII3oAOJ2QC3PKI3oAeDbtCgAANEfIBQCgOUIuAADNEXIBAGiOkAsAQHOEXAAAmiPkAgDQHCEXAIDmCLkAADRHyAUAoDlCLgAAzRFyAQBozqWuCwBm185eP9u7B7l3OMjqSi9bm+u5vrHWdVkAcCYhFzjRzl4/N27vZ/Dg
YZKkfzjIjdv7SSLoAjDztCsAJ9rePXgUcI8NHjzM9u5BRxUBwPCEXOBE9w4HI10HgFki5AInWl3pjXQdAGaJkAucaGtzPb3lpSeu9ZaXsrW53lFFADA8g2fAiY6Hy+yuAMA8EnKBZ7q+sSbUAjCXtCsAANAcIRcAgOYIuQAANEfIBQCgOUIuAADNEXIBAGiOkAsAQHOGDrmllKVSyl4p5QuTLAgAAC5qlJXcTyb51qQKAQCAcRkq5JZSnkvyy0lenWw5AABwccMe6/tbSX4jyU9NsBZgwezs9bO9e5B7h4OsrvSytbnuGGEAxuLMldxSyseSvFlrff2M73uplHK3lHL3/v37YysQaNPOXj83bu+nfzhITdI/HOTG7f3s7PW7Lg2ABgzTrnAtycdLKd9J8ntJPlJK+d2nv6nW+kqt9Wqt9erly5fHXCbQmu3dgwwePHzi2uDBw2zvHnRUEQAtOTPk1lpv1Fqfq7U+n+QTSe7UWn914pUBTbt3OBjpOgCMwj65QCdWV3ojXQeAUYwUcmutX661fmxSxQCLY2tzPb3lpSeu9ZaXsrW53lFFALRk2N0VAMbqeBcFuysAMAlCLtCZ6xtrQi0AEyHkQofsEwsAkyHkQkeO94k93kbreJ/YJIIuAFyQkAsdOW2fWCH3MavdAJyHkAsdsU/s2ax2A3Be9smFjtgn9mxORQPgvIRc6Ih9Ys9mtRuA8xJyoSPXN9Zy88UrWVvppSRZW+nl5otXPIZ/G6vdAJyXnlzokH1iT7e1uf5ET25itRuA4Qi5wMxyKhoA5yXkAjPNajcA5yHkAj/G3rQAzDshF3iCvWkBaIHdFYAn2JsWgBYIucAT7E0LQAu0K0ADxtlDu7rSS/+EQGtvWgDmiZVcmHPHPbT9w0FqHvfQ7uz1n/n9127dyQsvv5Zrt+782Pc5iQ2AFgi5MOdG6aEdJhA7iQ2AFmhXgDk3Sg/taYH47SHW3rQAzDsruTDnntUre9J1Q2UALAohF+bcKD20owRiAJhnQi7MuVF6aA2VAbAo9ORCA4btoT3+Hkf2AtA6IRcWjKGy6Rnn/sUAjEbIBZiA4+3ajnezON6uLYmgCzAFenIBJmCU/YsBGD8hF2ACbNcG0C0hF2ACbNcG0C0hF2ACbNcG0C2DZwATYLs2gG4JuQATYrs2gO5oVwAAoDlCLgAAzRFyAQBojpALAEBzhFwAAJoj5AIA0BwhFwCA5gi5AAA0R8gFAKA5Qi4AAM0RcgEAaI6QCwBAc4RcAACaI+QCANCcS10XADAvdvb62d49yL3DQVZXetnaXM/1jbWuywLgBEIuwBB29vq5cXs/gwcPkyT9w0Fu3N5PEkEXYAZpVwAYwvbuwaOAe2zw4GG2dw86qgiA0wi5AEO4dzgY6ToA3RJyAYawutIb6ToA3RJyAYawtbme3vLSE9d6y0vZ2lzvqCIATmPwDGAIx8NldlcAmA9CLjARLW63dX1jbe7/GwAWhZALjJ3ttgDomp5cYOxstwVA14RcYOxstwVA14RcYOxstwVA14RcsrPXz7Vbd/LCy6/l2q072dnrd10Sc852WwB0zeDZgjMgxCTYbguArgm5C+60ASGBhIuw3RYAXdKusOAMCAEALRJyF5wBIQCgRULugjMgBAC0SE/ugjMgBAC0SMjFgBAA0BztCgAANEfIBQCgOUIuAADNEXIBAGiOkAsAQHOEXAAAmiPkAgDQHCEXAIDmCLkAADRHyAUAoDlCLgAAzRFyAQBojpALAEBzzgy5pZSfLKV8vZTyzVLKn5RSfnMahQEAwHldGuJ7/m+Sj9Raf1hKWU7y1VLKf661/tGEawMAgHM5M+TWWmuSHx796/LRnzrJogAA4CKG6sktpSyVUt5I8maSL9VavzbZsgAA4PyGCrm11oe11g8keS7Jh0op73/6e0opL5VS7pZS7t6/f3/cdQIAwNBG2l2h1nqY5MtJfvGEr71Sa71aa716+fLlMZUHAACjG2Z3hcul
lJWjv/eS/HySb0+6MAAAOK9hdld4b5JPl1KW8lYo/v1a6xcmWxYAAJzfMLsr/HGSjSnUAgAAY+HEMwAAmiPkAgDQnGF6cufCzl4/27sHuXc4yOpKL1ub67m+sdZ1WQAAdKCJkLuz18+N2/sZPHiYJOkfDnLj9n6SCLoAAAuoiXaF7d2DRwH32ODBw2zvHnRUEQAAXWoi5N47HIx0HQCAtjURcldXeiNdBwCgbU2E3K3N9fSWl5641lteytbmekcVAQDQpSYGz46Hy+yuAABA0kjITd4KukItAABJI+0KAADwdkIuAADNEXIBAGiOkAsAQHOEXAAAmiPkAgDQHCEXAIDmCLkAADRHyAUAoDlCLgAAzRFyAQBojpALAEBzhFwAAJoj5AIA0BwhFwCA5gi5AAA0p9Rax/9DS7mf5LvP+PJ7kvxg7C/KPPEeWGx+/3gPLDa/f8b5HvjZWuvlk74wkZB7mlLK3Vrr1am+KDPFe2Cx+f3jPbDY/P6Z1ntAuwIAAM0RcgEAaE4XIfeVDl6T2eI9sNj8/vEeWGx+/0zlPTD1nlwAAJg07QoAADRnaiG3lPLvSilvllL++7Rek9lRSvmZUsp/KaV8q5TyJ6WUT3ZdE9NVSvnJUsrXSynfPHoP/GbXNTF9pZSlUspeKeULXdfC9JVSvlNK2S+lvFFKudt1PUxXKWWllPK5Usq3j/LA35no602rXaGU8uEkP0zy72ut75/KizIzSinvTfLeWus3Sik/leT1JNdrrX/acWlMSSmlJHlXrfWHpZTlJF9N8sla6x91XBpTVEr5Z0muJnl3rfVjXdfDdJVSvpPkaq3VPrkLqJTy6ST/tdb6ainlHUneWWs9nNTrTW0lt9b6lST/e1qvx2yptf7PWus3jv7+l0m+lWSt26qYpvqWHx796/LRH0MBC6SU8lySX07yate1ANNVSnl3kg8n+VSS1Fp/NMmAm+jJpQOllOeTbCT5WreVMG1Hj6rfSPJmki/VWr0HFstvJfmNJP+v60LoTE3yh6WU10spL3VdDFP1c0nuJ/mdo5alV0sp75rkCwq5TFUp5a8l+YMkv15r/Yuu62G6aq0Pa60fSPJckg+VUrQuLYhSyseSvFlrfb3rWujUtVrrB5P8UpJ/etTKyGK4lOSDSX671rqR5K+SvDzJFxRymZqjPsw/SPKZWuvtruuhO0ePqL6c5Bc7LoXpuZbk40c9mb+X5COllN/ttiSmrdZ67+ifbyb5fJIPdVsRU/T9JN9/2xO8z+Wt0DsxQi5TcTR09Kkk36q1/puu62H6SimXSykrR3/vJfn5JN/utiqmpdZ6o9b6XK31+SSfSHKn1vqrHZfFFJVS3nU0eJyjx9S/kMSOSwui1vrnSb5XSlk/uvTRJBMdPr80yR/+dqWUzyb5u0neU0r5fpJ/WWv91LRen85dS/IPk+wf9WQmyT+vtf6nDmtiut6b5NOllKW8dYP9+7VW20jB4vjpJJ9/a80jl5L8h1rrF7stiSn7tSSfOdpZ4c+S/ONJvpgTzwAAaI52BQAAmiPkAgDQHCEXAIDmCLkAADRHyAUAoDlCLgAAzRFyAQBojpALAEBz/j9McNcMFNHIXgAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 864x576 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "X = data['X']\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(12,8))\n",
    "ax.scatter(X[:, 0], X[:, 1])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
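    "The PCA algorithm is fairly simple. After making sure the data have been normalized, the output is just the singular value decomposition of the covariance matrix of the original data."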
   ]
  },
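  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A brief note on why an SVD routine can be used for this step (a sketch, writing $C$ for the covariance matrix, a symbol introduced here only for illustration). $C$ is symmetric and positive semidefinite, so it has an eigendecomposition with orthonormal eigenvectors and non-negative eigenvalues:\n",
    "$$C = U \\Lambda U ^ { T } , \\quad \\Lambda = \\operatorname { diag } ( \\lambda _ { 1 } , \\lambda _ { 2 } , \\cdots , \\lambda _ { m } ) , \\quad \\lambda _ { 1 } \\geq \\lambda _ { 2 } \\geq \\cdots \\geq \\lambda _ { m } \\geq 0$$\n",
    "This already has the form of a singular value decomposition in which the left and right singular vectors are both the columns of $U$ and the singular values are the $\\lambda _ { i }$, so an SVD of $C$ recovers the eigenvectors and eigenvalues that PCA needs."
   ]
  },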
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def pca(X):\n",
    "    # normalize the data (note: mean and std are taken over all entries, not per feature)\n",
    "    X = (X - X.mean()) / X.std()\n",
    "    \n",
    "    # compute the covariance matrix\n",
    "    X = np.matrix(X)\n",
    "    cov = (X.T * X) / X.shape[0]\n",
    "    \n",
    "    # perform SVD of the covariance matrix (np.linalg.svd returns V transposed as its third value)\n",
    "    U, S, V = np.linalg.svd(cov)\n",
    "    \n",
    "    return U, S, V"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(matrix([[-0.79241747, -0.60997914],\n",
       "         [-0.60997914,  0.79241747]]),\n",
       " array([1.43584536, 0.56415464]),\n",
       " matrix([[-0.79241747, -0.60997914],\n",
       "         [-0.60997914,  0.79241747]]))"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "U, S, V = pca(X)\n",
    "U, S, V"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
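    "Now that we have the principal components (the matrix U), we can use them to project the original data onto a lower-dimensional space. For this task we implement a function that computes the projection and keeps only the top K components, effectively reducing the number of dimensions."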
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "def project_data(X, U, k):\n",
    "    U_reduced = U[:,:k]\n",
    "    return np.dot(X, U_reduced)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matrix([[-4.74689738],\n",
       "        [-7.15889408],\n",
       "        [-4.79563345],\n",
       "        [-4.45754509],\n",
       "        [-4.80263579],\n",
       "        [-7.04081342],\n",
       "        [-4.97025076],\n",
       "        [-8.75934561],\n",
       "        [-6.2232703 ],\n",
       "        [-7.04497331],\n",
       "        [-6.91702866],\n",
       "        [-6.79543508],\n",
       "        [-6.3438312 ],\n",
       "        [-6.99891495],\n",
       "        [-4.54558119],\n",
       "        [-8.31574426],\n",
       "        [-7.16920841],\n",
       "        [-5.08083842],\n",
       "        [-8.54077427],\n",
       "        [-6.94102769],\n",
       "        [-8.5978815 ],\n",
       "        [-5.76620067],\n",
       "        [-8.2020797 ],\n",
       "        [-6.23890078],\n",
       "        [-4.37943868],\n",
       "        [-5.56947441],\n",
       "        [-7.53865023],\n",
       "        [-7.70645413],\n",
       "        [-5.17158343],\n",
       "        [-6.19268884],\n",
       "        [-6.24385246],\n",
       "        [-8.02715303],\n",
       "        [-4.81235176],\n",
       "        [-7.07993347],\n",
       "        [-5.45953289],\n",
       "        [-7.60014707],\n",
       "        [-4.39612191],\n",
       "        [-7.82288033],\n",
       "        [-3.40498213],\n",
       "        [-6.54290343],\n",
       "        [-7.17879573],\n",
       "        [-5.22572421],\n",
       "        [-4.83081168],\n",
       "        [-7.23907851],\n",
       "        [-4.36164051],\n",
       "        [-6.44590096],\n",
       "        [-2.69118076],\n",
       "        [-4.61386195],\n",
       "        [-5.88236227],\n",
       "        [-7.76732508]])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Z = project_data(X, U, 1)\n",
    "Z"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also approximately recover the original data by reversing the projection step."
   ]
  },
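  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Written out as formulas (a sketch, where $U _ { k }$ denotes the first $k$ columns of $U$, $X _ { rec }$ is a name introduced here for the recovered data, and the rows of $X$ are the samples, as in the code), projection and recovery are\n",
    "$$Z = X U _ { k } , \\qquad X _ { rec } = Z U _ { k } ^ { T } = X U _ { k } U _ { k } ^ { T }$$\n",
    "Because the columns of $U$ are orthonormal, $U _ { k } U _ { k } ^ { T }$ is the orthogonal projection onto the span of the first $k$ principal directions. For $k = m$ it equals the identity and the recovery is exact; for $k < m$ the part of each sample outside that span is lost, so the recovery is only an approximation."
   ]
  },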
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "def recover_data(Z, U, k):\n",
    "    U_reduced = U[:,:k]\n",
    "    return np.dot(Z, U_reduced.T)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matrix([[3.76152442, 2.89550838],\n",
       "        [5.67283275, 4.36677606],\n",
       "        [3.80014373, 2.92523637],\n",
       "        [3.53223661, 2.71900952],\n",
       "        [3.80569251, 2.92950765],\n",
       "        [5.57926356, 4.29474931],\n",
       "        [3.93851354, 3.03174929],\n",
       "        [6.94105849, 5.3430181 ],\n",
       "        [4.93142811, 3.79606507],\n",
       "        [5.58255993, 4.29728676],\n",
       "        [5.48117436, 4.21924319],\n",
       "        [5.38482148, 4.14507365],\n",
       "        [5.02696267, 3.8696047 ],\n",
       "        [5.54606249, 4.26919213],\n",
       "        [3.60199795, 2.77270971],\n",
       "        [6.58954104, 5.07243054],\n",
       "        [5.681006  , 4.37306758],\n",
       "        [4.02614513, 3.09920545],\n",
       "        [6.76785875, 5.20969415],\n",
       "        [5.50019161, 4.2338821 ],\n",
       "        [6.81311151, 5.24452836],\n",
       "        [4.56923815, 3.51726213],\n",
       "        [6.49947125, 5.00309752],\n",
       "        [4.94381398, 3.80559934],\n",
       "        [3.47034372, 2.67136624],\n",
       "        [4.41334883, 3.39726321],\n",
       "        [5.97375815, 4.59841938],\n",
       "        [6.10672889, 4.70077626],\n",
       "        [4.09805306, 3.15455801],\n",
       "        [4.90719483, 3.77741101],\n",
       "        [4.94773778, 3.80861976],\n",
       "        [6.36085631, 4.8963959 ],\n",
       "        [3.81339161, 2.93543419],\n",
       "        [5.61026298, 4.31861173],\n",
       "        [4.32622924, 3.33020118],\n",
       "        [6.02248932, 4.63593118],\n",
       "        [3.48356381, 2.68154267],\n",
       "        [6.19898705, 4.77179382],\n",
       "        [2.69816733, 2.07696807],\n",
       "        [5.18471099, 3.99103461],\n",
       "        [5.68860316, 4.37891565],\n",
       "        [4.14095516, 3.18758276],\n",
       "        [3.82801958, 2.94669436],\n",
       "        [5.73637229, 4.41568689],\n",
       "        [3.45624014, 2.66050973],\n",
       "        [5.10784454, 3.93186513],\n",
       "        [2.13253865, 1.64156413],\n",
       "        [3.65610482, 2.81435955],\n",
       "        [4.66128664, 3.58811828],\n",
       "        [6.1549641 , 4.73790627]])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_recovered = recover_data(Z, U, 1)\n",
    "X_recovered"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAsIAAAHSCAYAAADmLK3fAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3dfYyl51kf4N/NeCgnITCo2YJ3bONWoK0U3HhhlARZqsBQNoTUrEwkjAptUCu3qB+hrRZlEeIjqmTQSDRtIxG5SdtAwmeymZo0sKRKEBA1RrMeh41xRkrblHg2rZeESeJmRMebp3/szLI7ntn52PP9Xpc02nPe8845t3SU5Jdn7ud+qrUWAADomi8bdQEAADAKgjAAAJ0kCAMA0EmCMAAAnSQIAwDQSYIwAACddNtBbqqqTyb5QpIrSZ5vrS3seP3bkvznJP9z69K51tqbbvaeL33pS9vdd999yHIBAOBwLly48KettWM7rx8oCG/59tban97k9d9vrb32oG929913Z3l5+RAfDwAAh1dV/2u361ojAADopIMG4Zbkd6rqQlU9vMc931pVH62q36qql/WpPgAAGIiDtkbc11q7VFV/JckHqurjrbXfu+71J5J8fWvtuap6TZKlJN+48022QvTDSXLXXXfdYukAAHB0B1oRbq1d2vr32STvTfKKHa9/vrX23Nbj9yeZraqX7vI+j7bWFlprC8eOvaBfGQAAhmbfIFxVL66ql2w/TvJdST62456vq6raevyKrff9TP/LBQCA/jhIa8TXJnnvVs69Lckvt9Z+u6r+UZK01t6a5HVJfqSqnk+ykeSh1lobUM0AAHDL9g3CrbX/keTlu1x/63WP35LkLf0tDQAABsf4NAAAOkkQBgCgkwRhAAA6SRAGAKCTBGEAADpJEAYAoJMEYQAAOkkQBgCgkw5yshwAABzJ0spaFs+v5tL6Ro7P9XLm1ImcPjk/6rKSCMIAAAzI0spazp67mI3NK0mStfWNnD13MUnGIgxrjQAAYCAWz69eC8HbNjavZPH86ogqupEgDADAQFxa3zjU9WEThAEAGIjjc71DXR82QRgAgIE4c+pEerMzN1zrzc7kzKkTI6roRjbLAQAwENsb4kyNAACgc06fnB+b4LuT1ggAADpJEAYAoJMEYQAAOkmPMAAAexrnI5JvlSAMAMCuxv2I5FulNQIAgF2N+xHJt0oQBgBgV+N+RPKt0hoBAMCuvcDH53pZ2yX0jssRybfKijAAQMdt9wKvrW+k5S96gb/9rx8b6yOSb5UgDADQcXv1An/o45fzyIP3ZH6ul0oyP9fLIw/eMxUb5RKtEQAAnXezXuBxPiL5VlkRBgDouL16fqelF3gvgjAAQMedOXViqnuB96I1AgCg47ZbH6b1BLm9CMIAAEx1L/BetEYAANBJgjAAAJ0kCAMA0EmCMAAAnSQIAwDQSYIwAACdZHwaAMCEWFpZ69ys30EShAEAJsDSylrOnruYjc0rSZK19Y2cPXcxSYThI9IaAQAwARbPr14Lwds2Nq9k8fzqiCqafIIwAMAEuLS+cajr7E8QBgCYAMfneoe6zv4EYQCACXDm1In0ZmduuNabncmZUydGVNHks1kOAGACbG+IMzWifw4UhKvqk0m+kORKkudbaws7Xq8k/ybJa5J8McnrW2tP9LdUAIBuO31yXvDto8OsCH97a+1P93jtu5N849bPK5P8wta/AADchNnAo9Ov1ojvTfKLrbWW5CNVNVdVt7fWPt2n9wcAmDpmA4/WQTfLtSS/U1UXqurhXV6fT/Kp654/s3UNAIA9mA08WgddEb6vtXapqv5Kkg9U1cdba7933eu1y++0nRe2QvTDSXLXXXcdulgAgGliNvBoHWhFuLV2aevfZ5O8N8krdtzyTJI7r3t+R5JLu7zPo621hdbawrFjx45WMQDAlDAbeLT2DcJV9eKqesn2
4yTfleRjO257LMnfrateleRz+oMBAG7ObODROkhrxNcmee/VCWm5Lckvt9Z+u6r+UZK01t6a5P25OjrtE7k6Pu2HB1MuAMD0MBt4tOrqoIfhW1hYaMvLyyP5bAAAuqOqLuw8ByNxxDIAAB0lCAMA0EmCMAAAndSvk+UAADrF0ciTTxAGADgkRyNPB60RAACH5Gjk6SAIAwAckqORp4MgDABwSI5Gng6CMADAITkaeTrYLAcAcEiORp4OgjAAwBGcPjkv+E44rREAAHSSIAwAQCcJwgAAdJIgDABAJwnCAAB0kqkRAEAnLK2sGXfGDQRhAGDqLa2s5ey5i9nYvJIkWVvfyNlzF5NEGO4wrREAwNRbPL96LQRv29i8ksXzqyOqiHEgCAMAU+/S+sahrtMNWiMAgKmyWy/w8ble1nYJvcfneiOokHFhRRgAmBrbvcBr6xtp+Yte4G//68fSm5254d7e7EzOnDoxmkIZC4IwADA19uoF/tDHL+eRB+/J/FwvlWR+rpdHHrzHRrmO0xoBAEyNm/UCnz45L/hyA0EYAJhIeoG5VVojAICJoxeYfhCEAYCJoxeYftAaAQBMHL3A9IMVYQBg4uzV86sXmMMQhAGAiXPm1Am9wNwyrREAwMTZbn3YOTVCSwSHIQgDAGNlt7FouwVcvcDcKkEYABgb22PRtidCbI9FSyL00nd6hAGAsbHXWLTF86sjqohpJggDAGPjZmPRoN+0RgAAI+GIZEbNijAAMHSOSGYcCMIAwNA5IplxoDUCABg6RyQzDqwIAwBD54hkxoEgDAAMnSOSGQdaIwCAoXNEMuNAEAYARkIvMKOmNQIAgE4ShAEA6KQDB+Gqmqmqlap63y6vvb6qLlfVk1s//6C/ZQIAQH8dpkf4DUmeTvJVe7z+a621f3LrJQEAwOAdaEW4qu5I8j1J3jbYcgAAYDgO2hrx5iQ/luRLN7nn+6rqj6rq3VV15243VNXDVbVcVcuXL18+bK0AANA3+wbhqnptkmdbaxducttvJrm7tfY3kvzXJO/Y7abW2qOttYXW2sKxY8eOVDAAAPTDQVaE70vyQFV9MsmvJrm/qt55/Q2ttc+01v586+m/T/Itfa0SAAD6bN8g3Fo721q7o7V2d5KHknywtfaD199TVbdf9/SBXN1UBwAAY+vIJ8tV1ZuSLLfWHkvyz6rqgSTPJ/lsktf3pzwAABiMaq2N5IMXFhba8vLySD4bAIDuqKoLrbWFndePvCIMAEyHpZW1LJ5fzaX1jRyf6+XMqRM5fXJ+1GXBwAnCANBhSytrOXvuYjY2ryRJ1tY3cvbcxSQRhpl6Bz5iGQCYPovnV6+F4G0bm1eyeH51RBXB8AjCANBhl9Y3DnUdponWCADoiN16gY/P9bK2S+g9PtcbQYUwXIIwAEy5q33Af5SNzS9du7bdC/x93zKf91xYu6E9ojc7kzOnToyiVBgqrREAMMWWVtZy5jc+ekMI3raxeSUf+vjlPPLgPZmf66WSzM/18siD99goRydYEQaAKbZ4fjWbX9r7zIBL6xs5fXJe8KWTrAgDwBTbb9ObXmC6TBAGgCl2s6BbiV5gOk0QBoAJt7Sylvt+9oP5q2/8L7nvZz+YpZW1a6+dOXUis19Wu/7e33nVXVoi6DQ9wgAwwfY7GW476P70Y09lfWMzSfI1L5rNT/3tlwnBdJ4gDAAT7GYnw20HXZvhYHeCMABMmJ9YuphfefxTudJuPg0CuDlBGAAmyE8sXcw7P/In+95nGgTsz2Y5AJggv/L4p/a9x8lwcDBWhAFgzC2trGXx/GourW9k72aIq+PQjs/1cubUCT3BcACCMACMsZ1TIfYyU5X//shrhlQVTAetEQAwxnabCrGbH3jlnUOoBqaLFWEAGBPXt0BstzjsN/1hpio/8Mo7869O3zOkKmF6CMIAMAb2Ohjjq3uz1w7CuN78XC8ffuP9wy4TpoogDAAjtL0KvLbLyu/G5pV8xeyXpTc7c0N7
hKkQ0B96hAFgRLZXgXcLwdvWv7iZRx68J/NzvVSurgQ/8uA9pkJAH1gRBoAROchGuONzPUckw4BYEQaAEdlvI5wWCBgsQRgARuRmxyBrgYDBE4QBYETOnDqR3uzMDdd6szN58/ffmw+/8X4hGAZMjzAAjMh20N05O1gAhuEQhAFghGyEg9ERhAGgj3Y7HU7QhfEkCANAn+x1OlwSYRjGkM1yANAnu80F3ti8ksXzqyOqCLgZQRgA+mSvucD7zQsGRkMQBoA+2Wsu8M3mBQOjIwgDQJ/sNRfY6XAwnmyWA4A+MRcYJosgDAB9ZC4wTA6tEQAAdJIgDABAJwnCAAB0kiAMAEAnCcIAAHSSIAwAQCcJwgAAdNKBg3BVzVTVSlW9b5fX/lJV/VpVfaKqHq+qu/tZJAAA9NthDtR4Q5Knk3zVLq/9/SR/1lr7hqp6KMnPJfn+PtQHAH2ztLLm1DfgmgOtCFfVHUm+J8nb9rjle5O8Y+vxu5N8R1XVrZcHAP2xtLKWs+cuZm19Iy3J2vpGzp67mKWVtVGXBozIQVsj3pzkx5J8aY/X55N8Kklaa88n+VySv7zzpqp6uKqWq2r58uXLRygXAI5m8fxqNjav3HBtY/NKFs+vjqgiYNT2DcJV9dokz7bWLtzstl2utRdcaO3R1tpCa23h2LFjhygTAG7NpfWNQ10Hpt9BVoTvS/JAVX0yya8mub+q3rnjnmeS3JkkVXVbkq9O8tk+1gkAt+T4XO9Q14Hpt28Qbq2dba3d0Vq7O8lDST7YWvvBHbc9luTvbT1+3dY9L1gRBoBROXPqRHqzMzdc683O5MypEyOqCBi1w0yNuEFVvSnJcmvtsSRvT/JLVfWJXF0JfqhP9QFAX2xPhzA1AthWo1q4XVhYaMvLyyP5bACmi7FowM1U1YXW2sLO60deEQaAcbA9Fm17IsT2WLQkwjBwU45YBmCiGYsGHJUgDMBEMxYNOCpBGICJZiwacFSCMAATzVg04KhslgNgbB1kGoSxaMBRCcIAjKXDTIM4fXJe8AUOTWsEAGPJNAhg0ARhAMaSaRDAoAnCAIwl0yCAQROEARhLpkEAg2azHABjyTQIYNAEYQCGamllLT/92FNZ39hMknzNi2bzU3/7ZbsGXNMggEEShAEYmp9Yuph3fuRPbrj2Z1/czJl3fzTJC8eiAQySHmEAhmJpZS3v2hGCt21eacaiAUNnRRiAgdo+HW5tn7FnxqIBwyYIAzAwO0+Huxlj0YBh0xoBwMDsdjrcbmZnylg0YOgEYQAG5iDtDi/+8pksvu7lNsoBQ6c1AoCBOT7X27M3eN5cYGDErAgDMDB7nQ735u+/Nx9+4/1CMDBSVoQBGBinwwHjTBAGYKCcDgeMK60RAAB0kiAMAEAnCcIAAHSSIAwAQCcJwgAAdJIgDABAJwnCAAB0kiAMAEAnCcIAAHSSIAwAQCcJwgAAdJIgDABAJ9026gIAGLyllbUsnl/NpfWNHJ/r5cypEzl9cn7UZQGMlCAMMOWWVtZy9tzFbGxeSZKsrW/k7LmLSSIMA52mNQJgyi2eX70WgrdtbF7J4vnVEVUEMB4EYYApd2l941DXAbpCEAaYcsfneoe6DtAVgjDAlDtz6kR6szM3XOvNzuTMqRMjqghgPNgsBzDltjfEmRoBcCNBGKADTp+cF3wBdhCEASaQucAAt27fHuGq+oqq+sOq+mhVPVVVP7PLPa+vqstV9eTWzz8YTLkAbM8FXlvfSMtfzAVeWlkbdWkAE+Ugm+X+PMn9rbWXJ7k3yaur6lW73PdrrbV7t37e1tcqAbjGXGCA/ti3NaK11pI8t/V0duunDbIoAPZmLjBAfxxofFpVzVTVk0meTfKB1trju9z2fVX1R1X17qq6s69VAnCNucAA/XGgINxau9JauzfJHUleUVXftOOW30xyd2vtbyT5r0nesdv7VNXDVbVcVcuXL1++lboBOstcYID+ONSBGq219SS/m+TVO65/prX2
51tP/32Sb9nj9x9trS201haOHTt2hHIBOH1yPo88eE/m53qpJPNzvTzy4D2mRgAc0r49wlV1LMlma229qnpJvjPJz+245/bW2qe3nj6Q5Om+VwrANeYCA9y6g8wRvj3JO6pqJldXkH+9tfa+qnpTkuXW2mNJ/llVPZDk+SSfTfL6QRUMAAD9UFeHQgzfwsJCW15eHslnAwDQHVV1obW2sPP6oXqEAQBgWgjCAAB00kF6hAG4RUsra1k8v5pL6xs5PtfLmVMnbHYDGDFBGGDAllbWcvbcxWvHIq+tb+TsuYtJIgwDjJDWCIABWzy/ei0Eb9vYvJLF86sjqgiARBAGGLhL6xuHug7AcAjCAAN2fK53qOsADIcgDDBgZ06dSG925oZrvdmZnDl1YkQVAZDYLAcwcNsb4kyNABgvgjDAEJw+OS/4AowZQRjgiMwGBphsgjDAEZgNDDD5bJYDOAKzgQEmnyAMcARmAwNMPkEY4AjMBgaYfIIwwBGYDQww+WyWAzgCs4EBJp8gDHBEZgMDTDatEQAAdJIgDABAJwnCAAB0kiAMAEAnCcIAAHSSIAwAQCcJwgAAdJI5wsBUW1pZc+gFALsShIGptbSylrPnLmZj80qSZG19I2fPXUwSYRgArRHA9Fo8v3otBG/b2LySxfOrI6oIgHFiRRiYGjvbINbWN3a979Ie1wHoFkEYmAq7tUFUkrbLvcfnekOtDYDxpDUCmAq7tUG0JLXjvt7sTM6cOjG0ugAYX4IwMBX2andoSebneqmtfx958B4b5QBIojUCmBJ79QTPz/Xy4TfeP4KKABh3VoSBqXDm1In0ZmduuKYNAoCbsSIMjL2DHIqx/dzhGQAclCAMjLXDHIpx+uS84AvAgWmNAMaaQzEAGBRBGBhre02DcCgGALdKEAbG2l6HXzgUA4BbJQgDY800CAAGxWY5YKyZBgHAoAjCwEgcZCTaNtMgABgEQRgYusOMRAOAQdEjDAydkWgAjIN9g3BVfUVV/WFVfbSqnqqqn9nlnr9UVb9WVZ+oqser6u5BFAtMByPRABgHB1kR/vMk97fWXp7k3iSvrqpX7bjn7yf5s9baNyT510l+rr9lAtPESDQAxsG+Qbhd9dzW09mtn7bjtu9N8o6tx+9O8h1VVX2rEpgqRqIBMA4O1CNcVTNV9WSSZ5N8oLX2+I5b5pN8Kklaa88n+VySv9zPQoHpcfrkfB558J7Mz/VSSebnennkwXtslANgqA40NaK1diXJvVU1l+S9VfVNrbWPXXfLbqu/O1eNU1UPJ3k4Se66664jlAtMCyPRABi1Q02NaK2tJ/ndJK/e8dIzSe5Mkqq6LclXJ/nsLr//aGttobW2cOzYsSMVDAAA/bDvinBVHUuy2Vpbr6peku/MCzfDPZbk7yX5b0lel+SDrbUXrAgD0+cwB2MAwDg5SGvE7UneUVUzubqC/OuttfdV1ZuSLLfWHkvy9iS/VFWfyNWV4IcGVjEwNhyMAcAkq1Et3C4sLLTl5eWRfDZw65ZW1vLPf/3J7PZfIfNzvXz4jfcPvygA2EVVXWitLey87ohl4NB+Yuli3vmRP9nzdQdjADAJHLEMHMrSylredZMQnDgYA4DJIAgDh7J4fvWFsxF3cDAGAJNAawRwUzunQqzt0/Yw15u1UQ6AiSAIA3vabSpEZZfTcrbMzlR++oGXDa0+ALgVWiOAPS2eX70Wgre17H6U5Iu/fCaLr3u51WAAJoYVYWBPe01/aLk6Is0hGgBMMkEY2NNePcHmBAMwDbRGAHs6c+pEerMzN1zrzc6YCgHAVLAiDOxpu93h+qkR2iAAmBaCMHBTp0/OC74ATCVBGDpk50xgq7sAdJkgDB2x20zgs+cuJokwDEAn2SwHHbHbTOCNzStZPL86oooAYLQEYeiIvWYC73UdAKadIAwdcXyud6jrADDtBGGYMksra7nvZz+Yv/rG/5L7fvaDWVpZS2ImMADsZLMcTJGDbIgzNQIArhKE
YYrcbEPc9jxgwRcArtIaAVPEhjgAODhBGKaIDXEAcHCCMEwRG+IA4OD0CMME2e+IZBviAODgBGGYEAc9ItmGOAA4GK0RMCEckQwA/SUIw4QwEQIA+ktrBIyh3XqBj8/1srZL6DURAgCORhCGMbK0spaffuyprG9sXru23Qv8fd8yn/dcWLuhPcJECAA4OkEYxsDSylp+5jefyp99cXPX1zc2r+RDH7+cRx68x0QIAOgTQRhGbOc0iL1cWt8wEQIA+shmORix3aZB7EYvMAD0lyAMI3aQqQ96gQGg/wRhGLH9Vnq/5kWzeeTBe7REAECfCcIwYmdOnUhvduYF1+d6s3nz99+blZ/8LiEYAAbAZjkYse2QaxoEAAyXIAxjwDQIABg+rREAAHSSIAwAQCcJwgAAdJIgDABAJwnCAAB0kiAMAEAnCcIAAHSSIAwAQCftG4Sr6s6q+lBVPV1VT1XVG3a559uq6nNV9eTWz08OplwAAOiPg5ws93ySf9lae6KqXpLkQlV9oLX2xzvu+/3W2mv7XyIAAPTfvivCrbVPt9ae2Hr8hSRPJ3EWLAAAE+1QPcJVdXeSk0ke3+Xlb62qj1bVb1XVy/pQGwAADMxBWiOSJFX1lUnek+RHW2uf3/HyE0m+vrX2XFW9JslSkm/c5T0eTvJwktx1111HLhoAAG7VgVaEq2o2V0Pwu1pr53a+3lr7fGvtua3H708yW1Uv3eW+R1trC621hWPHjt1i6QAAcHQHmRpRSd6e5OnW2s/vcc/Xbd2XqnrF1vt+pp+FAgBAPx2kNeK+JD+U5GJVPbl17ceT3JUkrbW3Jnldkh+pqueTbCR5qLXWBlAvAAD0xb5BuLX2B0lqn3vekuQt/SoKAAAGzclyAAB0kiAMAEAnCcIAAHSSIAwAQCcJwgAAdJIgDABAJwnCAAB0kiAMAEAnCcIAAHSSIAwAQCcJwgAAdJIgDABAJwnCAAB0kiAMAEAnCcIAAHSSIAwAQCcJwgAAdJIgDABAJwnCAAB0kiAMAEAnCcIAAHSSIAwAQCcJwgAAdJIgDABAJwnCAAB0kiAMAEAnCcIAAHSSIAwAQCcJwgAAdJIgDABAJwnCAAB0kiAMAEAnCcIAAHSSIAwAQCcJwgAAdJIgDABAJwnCAAB00m2jLoDxsbSylsXzq7m0vpHjc72cOXUip0/Oj7osAICBEIRJcjUEnz13MRubV5Ika+sbOXvuYpIIwwDAVNIaQZJk8fzqtRC8bWPzShbPr46oIgCAwRKESZJcWt841HUAgEknCJMkOT7XO9R1AIBJJwiTJDlz6kR6szM3XOvNzuTMqRMjqggAYLBsliPJX2yIMzUCAOgKQZhrTp+cF3wBgM7YtzWiqu6sqg9V1dNV9VRVvWGXe6qq/m1VfaKq/qiqvnkw5QIAQH8cZEX4+ST/srX2RFW9JMmFqvpAa+2Pr7vnu5N849bPK5P8wta/AAAwlvZdEW6tfbq19sTW4y8keTrJzr+ff2+SX2xXfSTJXFXd3vdqAQCgTw41NaKq7k5yMsnjO16aT/Kp654/kxeG5VTVw1W1XFXLly9fPlylAADQRwcOwlX1lUnek+RHW2uf3/nyLr/SXnChtUdbawuttYVjx44drlIAAOijAwXhqprN1RD8rtbauV1ueSbJndc9vyPJpVsvDwAABuMgUyMqyduTPN1a+/k9bnssyd/dmh7xqiSfa619uo91AgBAXx1kasR9SX4oycWqenLr2o8nuStJWmtvTfL+JK9J8okkX0zyw/0vFQAA+mffINxa+4Ps3gN8/T0tyT/uV1EAADBoh5oaAQAA00IQBgCgkwRhAAA6SRAGAKCTBGEAADpJEAYAoJMEYQAAOkkQBgCgkwRhAAA6SRAGAKCTBGEAADpJEAYAoJMEYQAAOkkQBgCgkwRhAAA6SRAGAKCTBGEAADpJEAYAoJMEYQAAOkkQBgCgkwRhAAA6SRAGAKCTBGEAADpJEAYAoJMEYQAAOkkQ
BgCgkwRhAAA6SRAGAKCTBGEAADrptlEXMExLK2tZPL+aS+sbOT7Xy5lTJ3L65PyoywIAYAQ6E4SXVtZy9tzFbGxeSZKsrW/k7LmLSSIMAwB0UGdaIxbPr14Lwds2Nq9k8fzqiCoCAGCUOhOEL61vHOo6AADTrTNB+Phc71DXAQCYbp0JwmdOnUhvduaGa73ZmZw5dWJEFQEAMEqd2Sy3vSHO1AgAAJIOBeHkahgWfAEASDrUGgEAANcThAEA6CRBGACAThKEAQDoJEEYAIBOEoQBAOgkQRgAgE7aNwhX1X+oqmer6mN7vP5tVfW5qnpy6+cn+18mAAD010EO1PhPSd6S5Bdvcs/vt9Ze25eKAABgCPZdEW6t/V6Szw6hFgAAGJp+9Qh/a1V9tKp+q6pettdNVfVwVS1X1fLly5f79NEAAHB4/QjCTyT5+tbay5P8uyRLe93YWnu0tbbQWls4duxYHz4aAACO5paDcGvt862157Yevz/JbFW99JYrAwCAAbrlIFxVX1dVtfX4FVvv+ZlbfV8AABikfadGVNWvJPm2JC+tqmeS/FSS2SRprb01yeuS/EhVPZ9kI8lDrbU2sIoBAKAP9g3CrbUf2Of1t+TqeDUAAJgYNarF26q6nOR/HeFXX5rkT/tcDpPBd99Nvvfu8t13k++9mwb9vX99a+0FkxpGFoSPqqqWW2sLo66D4fPdd5Pvvbt8993ke++mUX3v/ZojDAAAE0UQBgCgkyYxCD866gIYGd99N/neu8t3302+924ayfc+cT3CAADQD5O4IgwAALdsYoJwVd1ZVR+qqqer6qmqesOoa2LwquorquoPq+qjW9/7z4y6JoanqmaqaqWq3jfqWhieqvpkVV2sqierannU9TAcVTVXVe+uqo9v/W/9t466Jgavqk5s/Wd9++fzVfWjQ/v8SWmNqKrbk9zeWnuiql6S5EKS0621Px5xaQzQ1vHdL26tPVdVs0n+IMkbWmsfGXFpDEFV/YskC0m+qrX22lHXw3BU1SeTLLTWzJLtkKp6R5Lfb629raq+PMmLWmvro66L4amqmSRrSV7ZWjvKWROHNjErwq21T7fWnth6/IUkTyeZH21VDFq76rmtp7NbP5Px/964JVV1R5LvSa/ubbcAAAIdSURBVPK2UdcCDFZVfVWSv5nk7UnSWvt/QnAnfUeS/z6sEJxMUBC+XlXdneRkksdHWwnDsPXn8SeTPJvkA60133s3vDnJjyX50qgLYehakt+pqgtV9fCoi2Eo/lqSy0n+41Y71Nuq6sWjLoqheyjJrwzzAycuCFfVVyZ5T5Ifba19ftT1MHittSuttXuT3JHkFVX1TaOuicGqqtcmeba1dmHUtTAS97XWvjnJdyf5x1X1N0ddEAN3W5JvTvILrbWTSf5vkjeOtiSGaasd5oEkvzHMz52oILzVI/qeJO9qrZ0bdT0M19afyX43yatHXAqDd1+SB7Z6RX81yf1V9c7RlsSwtNYubf37bJL3JnnFaCtiCJ5J8sx1f/F7d64GY7rju5M80Vr7P8P80IkJwlubpt6e5OnW2s+Puh6Go6qOVdXc1uNeku9M8vHRVsWgtdbOttbuaK3dnat/Kvtga+0HR1wWQ1BVL97aEJ2tP41/V5KPjbYqBq219r+TfKqqTmxd+o4kNsN3yw9kyG0RydU/RUyK+5L8UJKLW/2iSfLjrbX3j7AmBu/2JO/Y2kn6ZUl+vbVmlBZMr69N8t6rax+5Lckvt9Z+e7QlMST/NMm7tv5E/j+S/PCI62FIqupFSf5Wkn849M+elPFpAADQTxPTGgEAAP0kCAMA0EmCMAAAnSQIAwDQSYIwAACdJAgDANBJgjAAAJ0kCAMA0En/HzEfuWmKqvfDAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 864x576 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Plot the data reconstructed from its projection onto the first principal component\n",
    "fig, ax = plt.subplots(figsize=(12,8))\n",
    "ax.scatter(list(X_recovered[:, 0]), list(X_recovered[:, 1]))\n",
    "plt.show()"
   ]
  },
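  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the projection axis of the first principal component runs essentially along the diagonal of the data set. When we reduce the data to one dimension, we lose the variation around that diagonal, so in the reconstruction every point lies on the diagonal."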
   ]
  },
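  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a worked equation for the remark above, here is a minimal sketch of why the reconstruction collapses onto a line. It assumes the data are centered and reuses the chapter's notation, with $\\alpha_1$ the unit eigenvector of $\\Sigma$ for the largest eigenvalue $\\lambda_1$; the symbol $\\hat{x}$ for the reconstructed point is introduced here for illustration only.\n",
    "\n",
    "$$y_1 = \\alpha_1^T x, \\qquad \\hat{x} = \\alpha_1 y_1 = \\alpha_1 \\alpha_1^T x$$\n",
    "\n",
    "Every $\\hat{x}$ is a scalar multiple of $\\alpha_1$, so all reconstructed points lie on the line spanned by $\\alpha_1$. The discarded part is orthogonal to that line, and its expected squared length is the variance carried by the remaining components:\n",
    "\n",
    "$$x - \\hat{x} = (I - \\alpha_1 \\alpha_1^T) x, \\qquad E\\|x - \\hat{x}\\|^2 = \\sum_{i=2}^{m} \\lambda_i$$\n",
    "\n",
    "In the two-dimensional example this is just $\\lambda_2$, the variance perpendicular to the diagonal."
   ]
  },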
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "----\n",
    "\n",
    "Code updates for this notebook: https://github.com/fengdu78/lihang-code\n",
    "\n",
    "Chinese annotations by the WeChat public account 机器学习初学者 (ID: ai-start-com)\n",
    "\n",
    "Environment: Python 3.5+\n",
    "\n",
    "All code has been tested.\n",
    "![gongzhong](../gongzhong.jpg)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
